Word Sense Disambiguation for Arabic Text Categorization

Authors

Meryeme Hadni

Saïd El Alaoui Ouatik

Abdelmonaime Lachkar

Abstract

In this paper, we present two contributions to Arabic Word Sense Disambiguation. First, we propose using two external resources, Arabic WordNet (AWN) and WordNet (WN), combined through a term-to-term Machine Translation System (MTS). Second, we propose a disambiguation strategy that chooses the nearest concept for each ambiguous term, namely the candidate concept that has more relationships with the other concepts in the same local context. To evaluate the accuracy of the proposed method, we conducted several experiments using two feature selection methods, Chi-Square and CHIR, and two machine learning techniques, Naïve Bayesian (NB) and Support Vector Machine (SVM). The results show that the proposed method greatly improves the performance of our Arabic Text Categorization system.
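The disambiguation strategy described in the abstract (pick the candidate concept with the most relationships to concepts in the local context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the relation table `RELATED`, the candidate table `CANDIDATES`, the sense labels, and the function name `disambiguate` are all hypothetical, and the English toy data stands in for relations that the paper draws from AWN and WN.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy relation table: concept -> concepts it is related to.
# In the paper these relations come from AWN/WN; the entries here are invented.
RELATED: Dict[str, Set[str]] = {
    "bank.finance": {"money", "loan", "account"},
    "bank.river": {"water", "shore", "river"},
}

# Hypothetical candidate concepts (senses) for each ambiguous term.
CANDIDATES: Dict[str, List[str]] = {
    "bank": ["bank.finance", "bank.river"],
}


def disambiguate(
    term: str,
    context_concepts: Iterable[str],
    candidates: Dict[str, List[str]] = CANDIDATES,
    related: Dict[str, Set[str]] = RELATED,
) -> Optional[str]:
    """Return the candidate concept with the most relations to the context.

    Returns None when the term has no known candidates. Ties are resolved
    in favor of the candidate listed first (an arbitrary choice made here).
    """
    senses = candidates.get(term, [])
    if not senses:
        return None
    context = set(context_concepts)
    # Score each sense by how many context concepts it is related to.
    return max(senses, key=lambda s: len(related.get(s, set()) & context))


if __name__ == "__main__":
    print(disambiguate("bank", ["loan", "money", "water"]))
    print(disambiguate("bank", ["river", "water"]))
```

The score here is a plain overlap count; a weighted score over relation types would be an equally plausible reading of "more relationships", since the abstract does not specify the measure.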


Similar resources

Natural Language Processing based Soft Computing Techniques

This paper presents the implementation of soft computing (SC) techniques in the field of natural language processing. An attempt is made to design and implement an automatic tagger that extracts free text and then tags it. Part-of-speech (POS) tagging is the process of categorizing words based on their meaning, function, and type (noun, verb, adjective, etc.). Two stages tagging system ba...


The Role of Word Sense Disambiguation in Automated Text Categorization

Automated Text Categorization has reached the levels of accuracy of human experts. Provided that enough training data is available, it is possible to learn accurate automatic classifiers by using Information Retrieval and Machine Learning techniques. However, the performance of this approach is harmed by problems arising from language variation (especially polysemy and synonymy). We investigate...


The learning vector quantization algorithm applied to automatic text classification tasks

Automatic text classification is an important task for many natural language processing applications. This paper presents a neural approach to develop a text classifier based on the Learning Vector Quantization (LVQ) algorithm. The LVQ model is a classification method that uses a competitive supervised learning algorithm. The proposed method has been applied to two specific tasks: text categori...


Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus

Wednesday, June 15th 8:00 Conference Registration (Registration desk) 8:45 Session 1: Large-Scale Online Linguistic Resources (I) Chair: "Text Categorization Based on Subtopic Clusters" Francis Chik, Robert Luk, Korris Chung "Automatic Filtering of Bilingual Corpora for Statistical Machine Translation" Shahram Khadivi, Hermann Ney "The Role of Word Sense Disambiguation in Automated Text Categor...


Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models

There is a rich flora of word space models that have proven their efficiency in many different applications including information retrieval (Dumais et al., 1988), word sense disambiguation (Schütze, 1993), various semantic knowledge tests (Lund et al., 1995; Karlgren and Sahlgren, 2001), and text categorization (Sahlgren and Karlgren, 2005). Based on the assumption that each model captures some...



Journal:

Int. Arab J. Inf. Technol.

Volume 13, Issue -

Pages -

Publication date: 2016
